point in time. This expression involves that individual’s predictor values and the regression coefficients. Next, the software constructs a longer expression that includes the likelihood of getting exactly the observed survival times for all the participants in the data set. And if this isn’t already complicated enough, the expression also has to account for censored data. At this point, the software searches for the values of the regression coefficients that maximize this very long likelihood expression (similar to the way maximum likelihood is described with logistic regression in Chapter 18).
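The fitting process just described can be sketched in a few lines of Python. This is a minimal sketch, not how any particular package is implemented: it uses Cox's partial likelihood (the form PH regression software commonly maximizes) for a single predictor, and it assumes no two deaths occur at the same time (real software adds tie-handling rules, such as Breslow's or Efron's). The function names `partial_loglik` and `fit_beta` and the small data set are hypothetical, chosen only for illustration.

```python
import math

def partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one predictor (assumes no tied event times)."""
    ll = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if not d_i:
            continue  # censored participants add no term of their own...
        # ...but they stay in the risk set of every event that happens before they drop out
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        ll += beta * x_i - math.log(sum(math.exp(beta * x_j) for x_j in risk))
    return ll

def fit_beta(times, events, x, tol=1e-10, max_iter=100):
    """Newton-Raphson search for the beta that maximizes the partial likelihood."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for t_i, d_i, x_i in zip(times, events, x):
            if not d_i:
                continue
            risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
            w = [math.exp(beta * x_j) for x_j in risk]
            s0 = sum(w)
            mean = sum(wj * xj for wj, xj in zip(w, risk)) / s0
            second = sum(wj * xj * xj for wj, xj in zip(w, risk)) / s0
            score += x_i - mean            # first derivative of the log-likelihood
            info += second - mean * mean   # minus the second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: survival times, 1 = died, 0 = censored, and a yes/no predictor
times = [5, 8, 12, 15, 20, 25]
events = [1, 1, 0, 1, 1, 0]
x = [1, 0, 1, 1, 0, 0]

beta_hat = fit_beta(times, events, x)
print(beta_hat, math.exp(beta_hat))  # fitted coefficient and its hazard ratio
```

Newton-Raphson is used here only because it is short to write; any optimizer that finds the maximum of the same expression gives the same coefficient.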

Hazard ratios

Hazard ratios (HRs) are the estimates of relative risk obtained from PH regression. HRs in survival regression play a role similar to the one odds ratios play in logistic regression. They’re also calculated the same way from regression output, by exponentiating the regression coefficients:

In logistic regression: $\text{Odds ratio} = e^{\text{Coefficient}}$

In PH regression: $\text{Hazard ratio} = e^{\text{Coefficient}}$

Keep in mind that hazard is the chance of dying in any small period of time. For each predictor variable in a PH regression model, the software produces a coefficient that, when exponentiated, equals the HR. The HR tells you how much the hazard rate is multiplied when you increase the variable’s value by exactly 1.0 unit; for a yes/no predictor coded 1 and 0, that means comparing participants positive for the predictor with the comparison group. Therefore, an HR’s numerical value depends on the units in which the variable is expressed in your data. And for categorical predictors, interpreting the HR depends on how you code the categories.
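To make the dependence on units and coding concrete, here is a minimal Python sketch. The helper `hazard_ratio` and the coefficients `b = 0.3` and `b_exposed = 0.7` are hypothetical, chosen only for illustration; the sketch simply exponentiates a coefficient multiplied by the size of the change in the predictor.

```python
import math

def hazard_ratio(coef, change=1.0):
    """Factor by which the hazard is multiplied when a predictor increases by `change` units.

    `coef` is a (hypothetical) PH regression coefficient on the log-hazard scale.
    """
    return math.exp(coef * change)

# Hypothetical coefficient for a continuous predictor
b = 0.3
print(hazard_ratio(b))       # HR for a 1-unit increase
print(hazard_ratio(b, 2.0))  # HR for a 2-unit increase, equal to the 1-unit HR squared

# Hypothetical yes/no predictor coded 1 = exposed, 0 = not exposed
b_exposed = 0.7
hr_exposed_vs_not = hazard_ratio(b_exposed)
# Reversing the coding (1 = not exposed) flips the coefficient's sign,
# so the HR becomes the reciprocal of the original HR
hr_not_vs_exposed = hazard_ratio(-b_exposed)
print(hr_exposed_vs_not, hr_not_vs_exposed)
```

The same underlying effect can therefore produce very different-looking HRs, depending only on the unit of measurement or which category is coded 1.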

For example, if a survival regression model in a study of emphysema patients includes number of cigarettes smoked per day as a predictor of survival, and if the HR for this variable comes out equal to 1.05, then a participant’s chances of dying at any instant increase by a factor of 1.05 (5 percent) for every additional cigarette smoked per day. A 5 percent increase may not seem like much, but it’s applied for every additional cigarette per day. A person who smokes one pack (20 cigarettes) per day has that 1.05 multiplication applied 20 times, which is like multiplying by $1.05^{20}$, which equals about 2.65.

One pack contains 20 cigarettes, so if you change the units in which you record smoking levels from cigarettes per day to packs per day, you would use units that are 20 times larger. In that case, the corresponding regression coefficient is 20 times larger, and the HR is raised to the 20th power (2.65 instead of 1.05 in this example).
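A short Python check of this unit-change argument, using the example’s HR of 1.05 per cigarette: switching to packs per day multiplies the coefficient by 20, and exponentiating the new coefficient gives the same result as raising 1.05 to the 20th power. The variable names are illustrative only.

```python
import math

hr_per_cigarette = 1.05
# Regression coefficient (log-hazard scale) for smoking measured in cigarettes per day
b_per_cigarette = math.log(hr_per_cigarette)

# Measuring in packs per day (20 cigarettes per pack) makes each unit 20 times larger,
# so the coefficient becomes 20 times larger
b_per_pack = 20 * b_per_cigarette
hr_per_pack = math.exp(b_per_pack)

print(round(b_per_cigarette, 4), round(b_per_pack, 4))
print(round(hr_per_pack, 2))  # same as 1.05 ** 20, about 2.65
```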

And a two-pack-per-day smoker’s hazard is 2.65 times that of a one-pack-per-day smoker. This translates to multiplying the chances of dying at any instant by $2.65^2 \approx 7.02$ (approximately sevenfold) for the two-pack-per-day smoker compared to a nonsmoker.
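The same compounding can be checked numerically. This sketch uses the unrounded per-pack HR, so it gives about 7.04 rather than the 7.02 obtained by squaring the rounded value 2.65; both amount to roughly a sevenfold increase. Multiplying by the per-pack HR twice is the same as applying the per-cigarette HR 40 times.

```python
hr_per_pack = 1.05 ** 20               # about 2.65 for each additional pack per day
two_packs_vs_none = hr_per_pack ** 2   # hazard multiplied by the per-pack HR twice
same_via_cigarettes = 1.05 ** 40       # 40 cigarettes per day, 1.05 applied 40 times

print(round(two_packs_vs_none, 2))     # 7.04
print(round(2.65 ** 2, 4))             # squaring the rounded HR: 7.0225
```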

Executing a Survival Regression

As with all statistical methods dealing with time-to-event data, your dependent variable is actually a pair of variables: